Implementation

We have chosen to implement locks using regular files and index nodes. By using a unique naming algorithm, we are able to instantly `know' the name of a lock without having to search for it or ask a manager for the data. This minimizes time consuming calls to the network or to disk.

In order to secure a unique name, we need to provide enough information to be able to identify each atom uniquely. This is presently accomplished by passing a string to the lock function which can be combined with other elements such as the host on which the lock was created and any other relevant information. For convenience we classify atoms with an operator/operand pair. For example, consider a lock request to edit a file. In this case the operator would be ``edit'' and the operand would be the name of the file. The names must be processed to expunge unfortunate characters which would lead to illegal file names.

    CanonifyName(char *buffer)
    {
    for (sp = buffer; *sp != '\0'; sp++)
       {
       if (!isalnum(*sp))
          {
          *sp = '_';
          }
       }
    }

We use a function CanonifyName(name) which returns a string suitable for use as a filename. It suffices to swap illegal characters for an underscore, for instance.

In order to function properly, the lock-name must be different for each distinct atom, but must be constant over time so that multiple instantiations of the program will always find the same lock. The time information should therefore not be coded into the name of the lock; instead, one relies on the time stamps on the inodes to determine their age. For example, when editing the file /etc/motd on host dax, a lock named

 lock.cfengine_conf.dax.editfile._etc_motd

would be created.

We create two kinds of lock within the GetLock() call: a lock for active threads of execution which blocks multiple instantiations of a process, and a permanent lock which records the last time at which the resource was accessed. See figure [*]. The latter information can be encapsulated in a single inode without using any disk blocks and provides the information necessary to restrict the frequency of access. We call this an anti-spamming lock.

Figure: The adaptive lock components for an atom.
\begin{figure}
\centerline{
\begin{picture}(190,100)(0,0)
\put(100,80){\oval(...
...-2,-1){53}}
\put(100,59){\vector( 2,-1){53}}
\end{picture}
}
\end{figure}

If a lock already exists for a specified atom, and that lock has not expired, the atom remains locked and access to the atom is denied. Lock expiry occurs when a certain predefined period of time has elapsed since the active lock was created. In this case, a garbage collection mechanism attempts to carefully eliminate the process attached to the hanging lock (if it still exists) and then remove the old lock, replacing it with a new one and permitting the killing process to take over the task. The third possibility is that no active lock exists for an atom, but that the time since its previous execution is too short. This information is gleaned from the permanent lock. In that case access to the atom is also denied. This feature gives us the `anti-spamming' functionality.

Figure: A schematic illustration of the behaviour of locks with respect to the scheduling interval Δt and the parameters IfElapsed and ExpireAfter.
\begin{figure*}
\centerline{
\begin{picture}(400,213)(0,0)
\put(0,12){\vector...
...0,1){0.2}}
\put(315,188){\vector(0,-1){59}}
\end{picture}
}
\end{figure*}

A record of these lock transactions is kept for subsequent analysis if required. This indicates when and how locks were created and removed, thereby allowing problem cases, such as locks which always need to be removed forcibly, to be traced.

Figure: A schematic algorithm for implementing the locking policy. The function call GetLock() takes arguments which are used to build a unique name. Operator and operand pertain to the atom which is to be locked. The expiry time and elapsed time limits are times in minutes, and the now parameter is the system clock value for the time at which the lock is created. The function GetLastLock() creates the anti-spamming `last' lock if it does not previously exist. This is important for theoretical deadlock avoidance.
\begin{figure*}
\begin{verbatim}
GetLock(operator,operand,ifelapsed,expireaf...
...ning */
}
}
SetLock();
return true;
}\end{verbatim}
\end{figure*}

To make the locking behaviour user-configurable we introduce two parameters called ExpireAfter and IfElapsed, which have values in minutes. See figure [*]. ExpireAfter describes the number of minutes after creation at which a lock should expire. It is measured from the creation time-stamp on the active lock to the current value of the system clock. IfElapsed describes the number of minutes after which it becomes acceptable to execute the same atom again. It is measured from the modification time stamp of the anti-spamming lock to the current value of the system clock.

We choose to read the current time as a parameter to GetLock(), rather than reading it directly in the locking function, for the following reason. The most correct time to use here could be construed in one of two ways: it could be taken as being the time at which the program was started, or as the exact time at which the lock creation takes place. The difference between these times could differ by seconds, minutes or hours depending on the nature of the job being locked. By using the time at which the program was started for all locks throughout the program, one effectively treats a `pass' of the program as a cohesive entity: if one lock expires for a given value of ExpireAfter, they all expire. A certain ordering of atoms can be preserved. If, on the other hand, one always reads the present value of the system clock directly, the locking mechanism becomes sensitive to the length of time it has taken to execute the different parts of the program. Both policies might be desirable in different situations, so we do not see fit to impose any particular restriction on this.

When a lock has expired, we try to kill the owner process of the expired lock. The process ID of the expired process is read from the active lock. Then the signals CONT, INT, TERM and KILL are sent in that order. On some systems, INT is the only signal that will not hang the process permanently in case of a disk-wait situation, thus INT is sent first. Then the default terminate signal TERM is sent, and finally the non-ignorable signal KILL is sent. Sleep periods of several seconds separate these calls to give the kernel and program time to respond to the signals. The CONT signal is placed first in case the process has been suspended and wants to exit straight away. This should be harmless to non-suspended processes.